Improved Deep Embedded Clustering with Local Structure Preservation
Authors
Abstract
Deep clustering uses neural networks to learn feature representations that favor the clustering task. Some pioneering work proposes to learn embedded features and perform clustering simultaneously by explicitly defining a clustering-oriented loss. Although these methods have shown promising performance in various applications, we observe that they overlook a vital point: the clustering loss may corrupt the feature space, producing non-representative, meaningless features that in turn hurt clustering performance. To address this issue, we propose the Improved Deep Embedded Clustering (IDEC) algorithm, which preserves the structure of the data. Specifically, we manipulate the feature space to scatter data points, using a clustering loss as guidance. To constrain this manipulation and maintain the local structure of the data-generating distribution, we apply an under-complete autoencoder. By integrating the clustering loss with the autoencoder's reconstruction loss, IDEC jointly optimizes cluster label assignment and learns features that are suitable for clustering while preserving local structure. The resulting optimization problem can be solved effectively by mini-batch stochastic gradient descent and backpropagation. Experiments on image and text datasets empirically validate the importance of local structure preservation and the effectiveness of our algorithm.
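The joint objective described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not state the form of the clustering loss, so this sketch assumes a DEC-style loss: soft assignments from a Student's t kernel, a sharpened target distribution, and a KL divergence between the two. The function names, the mean-squared-error reconstruction term, and the weight `gamma` are illustrative assumptions, not details taken from the text above.

```python
import numpy as np

def soft_assign(z, centers, alpha=1.0):
    """Soft cluster assignments q_ij from a Student's t kernel (assumed form)."""
    # Squared Euclidean distance between every embedded point and every center.
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets p_ij that emphasize confident assignments (assumed form)."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def joint_loss(x, x_rec, z, centers, gamma=0.1):
    """Reconstruction loss plus gamma times the clustering loss.

    x       : inputs, shape (n, d)
    x_rec   : autoencoder reconstructions, shape (n, d)
    z       : embedded features from the encoder, shape (n, k)
    centers : cluster centers in the embedded space, shape (c, k)
    gamma   : hypothetical weight balancing the two terms
    """
    q = soft_assign(z, centers)
    p = target_distribution(q)
    # KL(P || Q), averaged over samples: the clustering term.
    clustering = np.sum(p * np.log(p / q)) / len(x)
    # Mean squared error: the structure-preserving term.
    reconstruction = np.mean((x - x_rec) ** 2)
    return reconstruction + gamma * clustering, reconstruction, clustering
```

In a training loop, this scalar would be minimized over the autoencoder weights and the cluster centers with mini-batch gradient steps. The reconstruction term is what keeps the encoder from distorting the feature space arbitrarily while the clustering term pulls points toward their centers.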
Similar resources
Solving Data Clustering Problems using Chaos Embedded Cat Swarm Optimization
In this paper, a new method is proposed for solving the data clustering problem using the Cat Swarm Optimization (CSO) algorithm based on chaotic behavior. Data clustering is an important problem in data mining that has long drawn the attention of researchers and practitioners because of its numerous applications to real-world problems. The CSO algorithm is one ...
Multimode Image Clustering Using Optimal Image Descriptor
Manifold-learning-based image clustering models are usually employed at the local level to deal with images sampled from a nonlinear manifold. Multimode patterns in image data matrices can vary from nominal to significant because of differences in expression, pose, illumination, or occlusion. We show that manifold-learning-based image clustering models are unable to achieve well separa...
Deep Transductive Semi-supervised Maximum Margin Clustering
Semi-supervised clustering is a very important topic in machine learning and computer vision. The key challenge is how to learn a metric such that instances sharing the same label are more likely to be close to each other in the embedded space. However, little attention has been paid to learning better representations when the data lie on a non-linear manifold. Fortunately, deep lear...
Mode region detection using improved Competitive Hebbian Learning for unsupervised clustering
This paper proposes an improved competitive Hebbian learning method for mode detection that uses a new activation function to overcome sensitivity to local irregularities in the pattern distribution. The method belongs to an unsupervised clustering approach divided into four processing stages. It begins with the estimation of the probability density function, followed by a competiti...